Statistical Analysis of Pokemon

EDA, Data Visualization, and Clustering Analysis

Like many men in their late twenties and early thirties, I grew up spellbound by Pokemon. I played the games, watched the show, collected the cards, read the comic books, and even made some of my own with my best friend. As an adult, it's still a guilty pleasure to train and battle in the latest games. I take an admittedly more rigorous approach to the game now, maxing out EVs and IVs and making sure that my team is strong in the current meta. I enjoy using sites like Smogon and Bulbapedia, but while they are very informative, I have yet to find a strong statistical analysis of Pokemon across the nine generations. This notebook aims to add to that conversation. I will seek to answer a few questions:

  1. What can a brief exploratory data analysis tell us about Pokemon?
  2. How have the stats assigned to Pokemon changed throughout the years?
  3. Are there clusters of similar Pokemon?

I start by importing my libraries and reading in data from each generation, then combining them into one dataframe. To gather this data I used Pokebase, a Python wrapper for PokeAPI. That notebook is also featured on this page.

In [3]:
import pandas as pd
import numpy as np
import seaborn as sns

# Read each generation's CSV and tag its rows with the generation number.
gens = []
for gen in range(1, 10):
    df = pd.read_csv(f'data/gen{gen}.csv')
    df['gen'] = gen
    gens.append(df)

all_gens = pd.concat(gens)

A quick look at the summary stats of the dataframe doesn't tell us a ton. There are 1008 total Pokemon, and the mean value for each individual statistic falls roughly in the 67-77 range, with fairly similar values for std, min, and quantiles. The max row shows the only real difference between the statistics: it points to outliers in several stats, with HP (max 255) the most extreme. Height and weight have fairly massive spreads, ranging from extremely small Pokemon to huge ones.

In [4]:
all_gens.describe()
Out[4]:
Unnamed: 0 id weight height hp attack defense spa spd speed gen
count 1008.000000 1008.000000 1008.000000 1008.000000 1008.000000 1008.000000 1008.000000 1008.000000 1008.000000 1008.000000 1008.000000
mean 58.807540 504.500000 659.926587 12.050595 69.872024 77.331349 72.089286 69.682540 69.893849 66.897817 4.669643
std 37.815848 291.128837 1200.911869 12.456096 26.655113 29.848464 29.173010 29.518284 26.682503 28.702156 2.596497
min 0.000000 1.000000 1.000000 1.000000 1.000000 5.000000 5.000000 10.000000 20.000000 5.000000 1.000000
25% 27.750000 252.750000 85.000000 5.000000 50.000000 55.000000 50.000000 45.000000 50.000000 45.000000 3.000000
50% 55.500000 504.500000 280.000000 10.000000 67.000000 75.000000 69.000000 65.000000 65.000000 65.000000 5.000000
75% 85.000000 756.250000 680.500000 15.000000 83.000000 100.000000 90.000000 90.000000 85.000000 87.000000 7.000000
max 155.000000 1008.000000 9999.000000 200.000000 255.000000 181.000000 230.000000 173.000000 230.000000 200.000000 9.000000
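
The max row only says that outliers exist, not which Pokemon they are. A minimal sketch of one way to look them up follows. The toy frame and its values are made up for illustration and stand in for all_gens; the reset_index call is there because pd.concat keeps each file's original row labels, which would make label-based lookups ambiguous.

```python
import pandas as pd

# Toy stand-in for all_gens; names and values are hypothetical.
toy = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "hp": [45, 255, 60, 70],
    "attack": [49, 10, 181, 80],
    "speed": [45, 55, 200, 30],
})

stat_cols = ["hp", "attack", "speed"]

# Give every row a unique label so idxmax points at exactly one row.
toy = toy.reset_index(drop=True)

# For each stat, find the row holding the maximum and report its name.
top_by_stat = {s: toy.loc[toy[s].idxmax(), "name"] for s in stat_cols}
print(top_by_stat)
```

On the real data, the same dictionary comprehension could be run over the stats list used later in the notebook.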

The second step in my EDA was to take a look at how all the stats interact with each other. As expected based on the values seen in the description table, height and weight are very skewed, with most pokemon being small and a select few being very large.

One interesting thing to note in this plot: there is a clear hotspot showing a 1:1 trend between Defense (defense) and Special Defense (spd, not to be confused with speed) and, to a lesser extent, between Attack (attack) and Special Attack (spa). I expected Speed and Defense/Special Defense to have a negative trend, and among the outlier points that seems to be the case. On the whole, however, there doesn't appear to be much of a relationship.

In [5]:
stats = ["hp","attack","defense","spa","spd","speed"]
# Sum the six base stats row-wise to get each Pokemon's stat total.
all_gens['stat_total'] = all_gens.loc[:, stats].sum(axis=1)
all_gens['primary_type'] = all_gens['primary_type'].astype('category')
all_gens['secondary_type'] = all_gens['secondary_type'].astype('category')

all_num = ["height","weight","hp","attack","defense","spa","spd","speed","stat_total"]
sns.pairplot(all_gens[all_num], plot_kws={'alpha':0.1})
Out[5]:
<seaborn.axisgrid.PairGrid at 0x172fb067550>

Next, I wanted to examine the correlation between different stats. There is nothing particularly surprising here. The height and weight columns have a correlation of 0.63, the highest between any pair of individual stats. stat_total is fairly highly correlated with all of the individual stats, but that is to be expected: it is the sum of those stats, so any mon with a particularly high or low value for one stat probably has a high or low stat_total.

In [8]:
corr_mat = all_gens[all_num].corr()
sns.heatmap(corr_mat, annot=True).set_title('Correlation of Statistics')
Out[8]:
Text(0.5, 1.0, 'Correlation of Statistics')
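
Reading the strongest pair off a heatmap by eye gets harder as the matrix grows. The sketch below shows one way to rank pairs directly from a correlation matrix. The toy columns and values are invented for illustration; on the real data, corr_mat from the cell above would take the place of corr.

```python
import numpy as np
import pandas as pd

# Toy stand-in for all_gens[all_num]; values are hypothetical.
toy = pd.DataFrame({
    "height": [1, 2, 3, 4, 5],
    "weight": [2, 4, 6, 8, 11],
    "speed": [5, 1, 4, 2, 3],
})

corr = toy.corr()

# Keep only the upper triangle (k=1 skips the diagonal) so each pair appears once.
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(mask).stack().dropna().sort_values(ascending=False)

# The first entry is the most positively correlated pair of columns.
strongest_pair = pairs.index[0]
print(strongest_pair, round(pairs.iloc[0], 2))
```

Sorting on pairs.abs() instead would also surface strong negative relationships.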

In addition to these statistics, each pokemon has a Primary Type (primary_type). The plot below shows the distribution of Primary Types across all generations. Water is the most common Primary Type, while Flying is the least common Primary Type. This is somewhat unexpected, as there are many Flying pokemon in each game. It will be clear in the following plots why this is the case. Another interesting finding here is that of the three "Starter" types (Water, Grass, and Fire), Fire is by far the least common. Nabbing a powerful Fire-type at the beginning of the game will fill a harder-to-plug gap in type coverage than picking either of the other two... as if we needed a scientific reason to pick Charizard.

In [12]:
ax = sns.countplot(data=all_gens, x = 'primary_type', order = all_gens['primary_type'].value_counts().index, palette=['#3399FF','#A9A896', '#77CC55','#AABB22', '#FF4422','#FF5599','#FFCC33','#BBAA66','#775544','#BB5544','#DDBB55','#AA5599','#7766EE','#6666BB','#AAAABB','#66CCFF','#EE99EE','#8899FF'])
ax.set_xticklabels(ax.get_xticklabels(), rotation=40, ha="right", fontsize=8)
ax.set_title('Primary Type Distribution Across Generations')
ax.plot()
Out[12]:
[]

To see how the type distribution changed across generations, I generated a crosstab of primary_type and gen. The results for the top four (Bug, Normal, Water, and Grass) seem to hold pretty consistently, with others being particularly high or low in certain generations. One interesting point is that there were 14 Poison-type pokemon introduced in the first generation, and no more than 6 in any subsequent generation.

In [17]:
pd.crosstab(all_gens['primary_type'], all_gens['gen']).sort_values(5, ascending=False)
Out[17]:
gen 1 2 3 4 5 6 7 8 9
primary_type
bug 12 10 12 8 18 3 9 4 7
normal 22 15 18 17 17 4 12 5 7
water 28 18 24 13 17 5 9 10 9
grass 12 9 12 13 15 5 12 8 11
psychic 8 7 8 7 14 3 6 5 2
dark 0 5 4 3 13 3 1 8 8
ground 8 3 6 4 9 0 2 4 4
fire 12 8 6 5 8 8 6 5 7
electric 9 6 4 7 7 3 4 9 9
fighting 7 2 4 2 7 3 4 8 3
dragon 3 0 7 3 7 4 3 4 6
ice 2 4 6 3 6 2 0 5 3
rock 9 4 8 6 6 8 5 4 7
ghost 3 1 4 6 5 4 4 4 4
steel 0 2 9 3 4 4 4 4 4
poison 14 1 3 6 2 2 6 1 3
flying 0 0 0 0 1 2 0 4 2
fairy 2 5 0 1 0 9 1 4 7
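
Generations differ a lot in size (the first-generation column above sums to 151, the sixth to 72), so raw counts mix generation size with type composition. As a hedged sketch, the normalize argument of pd.crosstab turns each column into shares of that generation. The toy rows below are invented for illustration.

```python
import pandas as pd

# Toy stand-in for all_gens; rows are hypothetical.
toy = pd.DataFrame({
    "primary_type": ["water", "water", "poison", "poison",
                     "water", "fire", "fire", "poison"],
    "gen": [1, 1, 1, 1, 2, 2, 2, 2],
})

# normalize="columns" divides each count by its generation's total,
# so every column sums to 1.
shares = pd.crosstab(toy["primary_type"], toy["gen"], normalize="columns")
print(shares.round(2))
```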

The chart below shows the distribution of secondary types across generations. The most common secondary type is None, followed by Flying, which is more than twice as common as the next-highest value. Flying is not a common Primary type, but it is by far the most common Secondary type.

In [20]:
ax = sns.countplot(data=all_gens, x = 'secondary_type', order = all_gens['secondary_type'].value_counts().index, palette = ['#070707', '#8899FF', '#AA5599', '#FF5599', '#DDBB55', '#EE99EE','#BB5544','#AAAABB','#7766EE','#6666BB','#77CC55','#775544','#3399FF','#66CCFF','#DDBB55','#FF4422','#A9A896','#FFCC33','#AABB22'])
ax.set_xticklabels(ax.get_xticklabels(), rotation=40, ha="right", fontsize=8)
ax.set_title('Secondary Type Distribution Across Generations')
ax.plot()
Out[20]:
[]
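
Since a type like Flying can be rare in one slot and common in the other, it can help to count each type across both slots at once. This is a minimal sketch on invented rows; it assumes, as the later cells suggest, that a missing secondary type is stored as the string 'none'.

```python
import pandas as pd

# Toy stand-in for all_gens; rows are hypothetical.
toy = pd.DataFrame({
    "primary_type": ["normal", "bug", "water", "flying", "fire"],
    "secondary_type": ["flying", "flying", "none", "dragon", "flying"],
})

# Stack both slots into one Series, drop the "none" placeholder, then count.
either_slot = pd.concat(
    [toy["primary_type"], toy["secondary_type"]], ignore_index=True
)
either_slot = either_slot[either_slot != "none"]
type_counts = either_slot.value_counts()
print(type_counts)
```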

How frequently do the types co-occur? As we see below, there are a few pairings that are far more common than others. Normal/Flying, Grass/Poison, Bug/Flying, and Bug/Poison are fairly common pairs.

In [23]:
plt_df = all_gens.loc[all_gens['secondary_type'] != 'none']
ctab = pd.crosstab(index=plt_df['primary_type'], columns=plt_df['secondary_type'])
sns.heatmap(ctab, robust = False).set_title('Co-occurrence of Types')
Out[23]:
Text(0.5, 1.0, 'Co-occurrence of Types')
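
A heatmap shows where the hot cells are, but a ranked list makes the most common pairings explicit. The sketch below counts primary/secondary pairs on invented rows. On the real data it would be applied to plt_df, and because the type columns there are categorical, passing observed=True to groupby would avoid listing pairs that never occur.

```python
import pandas as pd

# Toy stand-in for plt_df; rows are hypothetical.
toy = pd.DataFrame({
    "primary_type": ["normal", "normal", "grass", "bug", "bug", "normal"],
    "secondary_type": ["flying", "flying", "poison", "flying", "poison", "flying"],
})

# Count each (primary, secondary) pair and rank from most to least common.
pair_counts = (
    toy.groupby(["primary_type", "secondary_type"])
    .size()
    .sort_values(ascending=False)
)
print(pair_counts.head(3))
```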

We've examined how type distribution has changed across generations, but equally interesting (to me) is how stat distribution has changed across generations. How has the meta-game changed through the years? Have the games suffered from stat-creep? The plot below shows a violin plot of stat_total with gen. We can see that the median stat_total does appear to have increased since the original three generations and the distributions are more top-heavy. However, there is one critical caveat to this plot: it counts only the Pokemon introduced in each generation, not all of those present in that generation's games. My uneducated guess is that there are already enough "filler" Pokemon that are found in the early game or just not used at all, so recent development has focused on introducing new Pokemon that are competitively viable or, at minimum, viable throughout the campaign.

In [27]:
sns.violinplot( x = all_gens['gen'], y = all_gens['stat_total'], palette = ['#AEB1BA']).set_title('Total Stat Distribution Across Generations')
Out[27]:
Text(0.5, 1.0, 'Total Stat Distribution Across Generations')
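
The difference between "introduced in" and "available by" a generation can be made concrete with two medians. The sketch below is only an approximation: it assumes every Pokemon from earlier generations stays available in later ones, which is a simplification, and the toy numbers are invented.

```python
import pandas as pd

# Toy stand-in for all_gens; values are hypothetical.
toy = pd.DataFrame({
    "gen": [1, 1, 1, 2, 2, 3, 3],
    "stat_total": [300, 400, 500, 450, 550, 500, 600],
})

# Median of the Pokemon introduced in each generation (the violin plot's view).
introduced = toy.groupby("gen")["stat_total"].median()

# Median over everything introduced up to and including each generation,
# under the simplifying assumption that nothing is ever removed.
cumulative = pd.Series({
    g: toy.loc[toy["gen"] <= g, "stat_total"].median()
    for g in sorted(toy["gen"].unique())
})

print(pd.DataFrame({"introduced": introduced, "cumulative": cumulative}))
```

If the cumulative median rises more slowly than the per-generation one, that is consistent with newer additions being stronger than the existing pool.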

Cluster Analysis¶

Having played many of the games, I know that there are certain archetypes of Pokemon found throughout the games. Examples include "Starter Pokemon," "Regional Bird," "Regional Mammal/Rat," and "Pseudo-legendaries." Competitively, there are also archetypes like "Special Attack Sweeper" and "Defensive Wall." This cluster analysis attempts to programmatically identify those archetypes.

To begin, I import the packages, drop irrelevant columns, scale my data, and run kmeans to see how many clusters will produce a well-fit model without overfitting. There seems to be a nice drop-off at about 50 clusters.

In [28]:
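import os

# A plain Python variable named OMP_NUM_THREADS has no effect; the warning refers to
# the environment variable, which generally needs to be set before numpy and
# scikit-learn are first imported (restart the kernel if they are already loaded).
os.environ['OMP_NUM_THREADS'] = '4'

from sklearn.cluster import KMeans
from sklearn import metrics
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Drop identifiers and the generation label, then one-hot encode the two type columns.
X = all_gens.drop(['name', 'id', 'Unnamed: 0', 'gen'], axis=1)
X = pd.get_dummies(data=X, columns=['primary_type', 'secondary_type'])
# Drop one level per type column to serve as the reference category.
X = X.drop(['primary_type_normal', 'secondary_type_none'], axis=1)
scaler = StandardScaler()
X_trans = scaler.fit_transform(X)

# Fit k-means for k = 1..99 and record the inertia (within-cluster sum of squares).
max_clust = 100
scores = []
clust = []
for i in range(1, max_clust):
    kmeans = KMeans(n_clusters=i, random_state=42, n_init="auto").fit(X_trans)
    scores.append(kmeans.inertia_)
    clust.append(i)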
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
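
Inertia tends to keep falling as clusters are added, so picking the drop-off point involves some judgement. One complementary check, which is not what this notebook ran, is the silhouette score from sklearn.metrics. The sketch below uses synthetic blobs as a stand-in for X_trans; the data, the range of k, and the variable names are all assumptions made for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data with three well-separated groups; stands in for X_trans.
X_demo, _ = make_blobs(
    n_samples=300,
    centers=[[-10, -10], [0, 0], [10, 10]],
    cluster_std=0.5,
    random_state=0,
)

sil_scores = {}
for k in range(2, 7):  # the silhouette score needs at least two clusters
    labels = KMeans(n_clusters=k, random_state=42, n_init=10).fit_predict(X_demo)
    sil_scores[k] = silhouette_score(X_demo, labels)

# Higher is better: pick the k with the largest mean silhouette.
best_k = max(sil_scores, key=sil_scores.get)
print(best_k)
```

On the real feature matrix the peak is unlikely to be this clean, so it is best read alongside the inertia curve.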
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
In [29]:
sns.scatterplot(x=clust, y=scores).set_title('Inertia in KMeans Runs of Varying K')
Out[29]:
Text(0.5, 1.0, 'Inertia in KMeans Runs of Varying K')
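
Inertia always falls as K grows, so the elbow in a plot like this can be hard to read. As a complementary check, here is a minimal sketch that scores each K with the silhouette coefficient. It runs on synthetic blobs rather than the Pokemon data; `X_demo`, `sil_scores`, and `best_k` are illustrative names, not part of the original notebook.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for X_trans: a few blobs in 6 dimensions (assumed data).
X_demo, _ = make_blobs(n_samples=300, centers=4, n_features=6,
                       cluster_std=1.0, random_state=42)

sil_scores = {}
for k in range(2, 9):  # the silhouette coefficient needs at least 2 clusters
    labels_k = KMeans(n_clusters=k, random_state=42, n_init=10).fit_predict(X_demo)
    sil_scores[k] = silhouette_score(X_demo, labels_k)

# Higher is better; values lie between -1 and 1.
best_k = max(sil_scores, key=sil_scores.get)
print(best_k, round(sil_scores[best_k], 3))
```

On the real data the same loop could be run over `X_trans`, though with roughly a thousand rows and many values of K it will take noticeably longer than the inertia loop.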

Identifying clusters is great, but to make the results tractable, I want to find the most representative Pokemon for each cluster: the one closest to its centroid. The code below runs the final iteration of KMeans with 50 clusters, finds the Pokemon nearest each centroid, and prints the first few results. We will dig into these more below. At first look, it seems we may have found some of the clusters I hypothesized above: Grass starters, Water starters, and regional birds.

In [30]:
import re

kmeans_fin = KMeans(n_clusters=50, random_state=42, n_init="auto").fit(X_trans)
labels = kmeans_fin.predict(X_trans)

all_gens['cluster'] = labels

centers = kmeans_fin.cluster_centers_

# For each cluster center, find the row of X_trans closest to it (Euclidean distance)
centroids = []
for i, c in enumerate(centers):
    closest = np.inf
    closest_j = None
    for j, v in enumerate(X_trans):
        dist = np.linalg.norm(v - c)
        if dist < closest:
            closest = dist
            closest_j = j
    centroids.append((i, closest_j))

# Look up the name of each representative Pokemon
centroid_names = []
for clust, idx in centroids:
    mon = all_gens.loc[all_gens.id == idx + 1]  # matching by dex ID, not index, so add one
    # Pull the name out of the printed Series; a hyphenated name keeps only
    # its last lowercase part (e.g. tapu-fini becomes fini)
    n = re.search(r"(\d?)([a-z]+)\s", str(mon['name']))
    name = n.group()[:-1]
    centroid_names.append((clust, name))

centroid_names[0:5]
Out[30]:
[(0, 'oranguru'),
 (1, 'snivy'),
 (2, 'quaxwell'),
 (3, 'ivysaur'),
 (4, 'pidgeotto')]
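
The nearest-row search above can also be written without the nested loops. The sketch below uses scikit-learn's `pairwise_distances_argmin_min` and a positional `iloc` lookup, so it does not depend on dex IDs being contiguous or on parsing a printed Series. The `toy` frame and its names are hypothetical stand-ins for `all_gens` and `X_trans`.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances_argmin_min

# Toy stand-in for all_gens (hypothetical values, two obvious groups).
toy = pd.DataFrame({
    "name":   ["a-low", "b-low", "c-low", "a-high", "b-high", "c-high"],
    "hp":     [10, 11, 12, 90, 91, 92],
    "attack": [10, 11, 12, 90, 91, 92],
})
X_toy = toy[["hp", "attack"]].to_numpy(dtype=float)

km = KMeans(n_clusters=2, random_state=42, n_init=10).fit(X_toy)

# For each cluster center, the positional index of the closest row in X_toy.
nearest_rows, _ = pairwise_distances_argmin_min(km.cluster_centers_, X_toy)

# iloc is positional, so it lines up with the rows of X_toy even when the
# DataFrame index repeats (as it does after pd.concat without ignore_index).
representatives = toy["name"].iloc[nearest_rows].tolist()
print(representatives)
```

Because the names are read directly from the column, hyphenated names come through whole.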

Taking a deeper dive into cluster 4, it seems we have found our cluster of regional birds. Interestingly, the Bug/Flying regional bugs and a handful of Water/Flying Pokemon have been lumped into this cluster as well. Cluster 4 also drives home a point found in nearly all the subsequent clusters: because one-hot encoding the types produces so many features relative to the six stats, these clusters are largely driven by typing.

In [175]:
all_gens.loc[all_gens['cluster'] == 4]
Out[175]:
Unnamed: 0 id name weight height primary_type secondary_type hp attack defense spa spd speed gen stat_total cluster
11 11 12 butterfree 320 11 bug flying 60 45 50 90 80 70 1 395 4
15 15 16 pidgey 18 3 normal flying 40 45 40 35 35 56 1 251 4
16 16 17 pidgeotto 300 11 normal flying 63 60 55 50 50 71 1 349 4
17 17 18 pidgeot 395 15 normal flying 83 80 75 70 70 101 1 479 4
20 20 21 spearow 20 3 normal flying 40 60 30 31 31 70 1 262 4
21 21 22 fearow 380 12 normal flying 65 90 65 61 61 100 1 442 4
82 82 83 farfetchd 150 8 normal flying 52 90 55 58 62 60 1 377 4
83 83 84 doduo 392 14 normal flying 35 85 45 35 35 75 1 310 4
84 84 85 dodrio 852 18 normal flying 60 110 70 60 60 110 1 470 4
122 122 123 scyther 560 15 bug flying 70 110 80 55 80 105 1 500 4
11 11 163 hoothoot 212 7 normal flying 60 30 30 36 56 50 2 262 4
12 12 164 noctowl 408 16 normal flying 100 50 50 86 96 70 2 452 4
14 14 166 ledian 356 14 bug flying 55 35 50 55 110 85 2 390 4
41 41 193 yanma 380 12 bug flying 65 65 45 75 45 95 2 390 4
74 74 226 mantine 2200 21 water flying 85 40 70 80 140 70 2 485 4
15 15 267 beautifly 284 10 bug flying 60 70 50 100 50 65 3 395 4
24 24 276 taillow 23 3 normal flying 40 55 30 30 30 85 3 270 4
25 25 277 swellow 198 7 normal flying 60 85 60 75 50 125 3 455 4
26 26 278 wingull 95 6 water flying 40 30 30 55 30 85 3 270 4
27 27 279 pelipper 280 12 water flying 60 50 100 95 70 65 3 440 4
32 32 284 masquerain 36 8 bug flying 70 60 62 100 82 80 3 454 4
39 39 291 ninjask 120 8 bug flying 61 90 45 50 50 160 3 456 4
81 81 333 swablu 12 4 normal flying 45 40 60 40 75 50 3 310 4
9 9 396 starly 20 3 normal flying 40 55 30 30 30 60 4 245 4
10 10 397 staravia 155 6 normal flying 55 75 50 40 40 80 4 340 4
11 11 398 staraptor 249 12 normal flying 85 120 70 50 60 100 4 485 4
27 27 414 mothim 233 9 bug flying 70 94 50 94 50 66 4 424 4
29 29 416 vespiquen 385 12 bug flying 70 80 102 80 102 40 4 474 4
54 54 441 chatot 19 5 normal flying 76 65 45 92 42 91 4 411 4
71 71 458 mantyke 650 10 water flying 45 20 50 60 120 50 4 345 4
82 82 469 yanmega 515 19 bug flying 86 76 86 116 56 95 4 515 4
25 25 519 pidove 21 3 normal flying 50 55 50 36 30 43 5 264 4
26 26 520 tranquill 150 6 normal flying 62 77 62 50 42 65 5 358 4
27 27 521 unfezant 290 12 normal flying 80 115 80 65 55 93 5 488 4
86 86 580 ducklett 55 5 water flying 62 44 50 44 50 55 5 305 4
87 87 581 swanna 242 13 water flying 75 87 63 87 63 98 5 473 4
133 133 627 rufflet 105 5 normal flying 70 83 50 37 50 60 5 350 4
134 134 628 braviary 410 15 normal flying 100 123 75 57 75 80 5 510 4
11 11 661 fletchling 17 3 normal flying 45 50 43 40 38 62 6 278 4
16 16 666 vivillon 170 12 bug flying 80 52 50 90 50 89 6 411 4
9 9 731 pikipek 12 3 normal flying 35 75 30 30 30 65 7 265 4
10 10 732 trumbeak 148 6 normal flying 55 85 50 40 50 75 7 355 4
11 11 733 toucannon 260 11 normal flying 80 120 75 75 75 60 7 485 4
25 25 931 squawkabilly 24 6 normal flying 82 96 51 45 51 92 9 417 4
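
To see why typing can dominate, here is a rough back-of-the-envelope sketch. It assumes the one-hot type columns were standardized together with the stats (as a StandardScaler would do; the scaler itself is not shown in this section), that a given type is held by 5% of Pokemon, and that two Pokemon differ by 2 standard deviations on each of the six stats. All three numbers are illustrative assumptions, not values measured from the data.

```python
import numpy as np

def onehot_gap(p):
    """Distance between a 1 and a 0 in a one-hot column after standard scaling.

    p is the share of rows holding a 1, so the column's standard deviation
    is sqrt(p * (1 - p)) and the scaled gap is 1 / sqrt(p * (1 - p)).
    """
    return 1.0 / np.sqrt(p * (1.0 - p))

p = 0.05  # assumed prevalence of each of the two types involved

# Swapping one type for another changes two one-hot columns.
type_sq = 2 * onehot_gap(p) ** 2

# Six stats, each differing by 2 standard deviations (a large stat gap).
stat_sq = 6 * 2.0 ** 2

print(round(type_sq, 1), stat_sq)  # 42.1 24.0
```

Under these assumptions, a single type mismatch adds more squared distance than a large gap on every stat combined, which is consistent with clusters that are uniform in type but wide in stat_total.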

Cluster 45 is also interesting to examine. It successfully groups the Fire/Fighting starters (at least their second and third stages), along with some unexpected inclusions in Scraggy, Stufful, and Pawmo. Again, every Pokemon in this cluster has a Secondary Type of Fighting.

In [171]:
all_gens.loc[all_gens['cluster'] == 45]
Out[171]:
Unnamed: 0 id name weight height primary_type secondary_type hp attack defense spa spd speed gen stat_total cluster
4 4 256 combusken 195 9 fire fighting 60 85 60 85 60 55 3 405 45
5 5 257 blaziken 520 19 fire fighting 80 120 70 110 70 80 3 530 45
4 4 391 monferno 220 9 fire fighting 64 78 52 78 52 81 4 405 45
5 5 392 infernape 550 12 fire fighting 76 104 71 104 71 108 4 534 45
5 5 499 pignite 555 10 fire fighting 90 93 55 70 55 55 5 418 45
6 6 500 emboar 1500 16 fire fighting 110 123 65 100 65 65 5 528 45
65 65 559 scraggy 118 6 dark fighting 50 75 70 35 70 48 5 348 45
37 37 759 stufful 68 5 normal fighting 70 75 50 45 50 50 7 340 45
16 16 922 pawmo 65 4 electric fighting 60 75 40 50 40 85 9 350 45

Cluster 10 is less type-driven than the others and is composed primarily of Legendary Pokemon, along with Gyarados, Snorlax, Blissey, Slaking, and Milotic, all powerful non-legendaries.

In [174]:
all_gens.loc[all_gens['cluster'] == 10]
Out[174]:
Unnamed: 0 id name weight height primary_type secondary_type hp attack defense spa spd speed gen stat_total cluster
129 129 130 gyarados 2350 65 water flying 95 125 79 60 100 81 1 540 10
142 142 143 snorlax 4600 21 normal none 160 110 65 65 110 30 1 540 10
90 90 242 blissey 468 15 normal none 255 10 10 75 135 55 2 540 10
93 93 245 suicune 1870 20 water none 100 75 115 90 115 85 2 580 10
97 97 249 lugia 2160 52 psychic flying 106 90 130 90 154 110 2 680 10
37 37 289 slaking 1305 20 normal none 150 160 100 95 65 100 3 670 10
98 98 350 milotic 1620 62 water none 95 60 79 100 125 81 3 540 10
130 130 382 kyogre 3520 45 water none 100 100 90 150 140 90 3 670 10
131 131 383 groudon 9500 35 ground none 100 150 140 100 90 90 3 670 10
99 99 486 regigigas 4200 37 normal none 110 160 110 80 110 100 4 670 10
106 106 493 arceus 3200 32 normal none 120 120 120 120 120 120 4 720 10
67 67 717 yveltal 2030 58 dark flying 126 131 95 131 98 99 6 680 10

Cluster 22 contains almost all of the pseudo-legendary Dragon-type Pokemon, as well as Latias and Latios. This cluster seems to have worked particularly well.

In [201]:
all_gens.loc[all_gens['cluster'] == 22]
Out[201]:
Unnamed: 0 id name weight height primary_type secondary_type hp attack defense spa spd speed gen stat_total cluster
146 146 147 dratini 33 18 dragon none 41 64 45 50 50 50 1 300 22
147 147 148 dragonair 165 40 dragon none 61 84 65 70 70 70 1 420 22
148 148 149 dragonite 2100 22 dragon flying 91 134 95 100 100 80 1 600 22
82 82 334 altaria 206 11 dragon flying 75 70 90 70 105 80 3 490 22
119 119 371 bagon 421 6 dragon none 45 75 60 40 30 50 3 300 22
120 120 372 shelgon 1105 11 dragon none 65 95 100 60 50 50 3 420 22
121 121 373 salamence 1026 15 dragon flying 95 135 80 110 80 100 3 600 22
128 128 380 latias 400 14 dragon psychic 80 80 90 110 130 110 3 600 22
129 129 381 latios 600 20 dragon psychic 80 90 80 130 110 110 3 600 22
132 132 384 rayquaza 2065 70 dragon flying 105 150 90 150 90 95 3 680 22
58 58 445 garchomp 950 19 dragon ground 108 130 95 80 85 102 4 600 22
116 116 610 axew 180 6 dragon none 46 87 60 30 40 57 5 320 22
117 117 611 fraxure 360 10 dragon none 66 117 70 40 50 67 5 410 22
118 118 612 haxorus 1055 18 dragon none 76 147 90 60 70 97 5 540 22
127 127 621 druddigon 1390 16 dragon none 77 120 90 60 90 48 5 485 22
54 54 704 goomy 28 3 dragon none 45 50 35 55 75 40 6 300 22
55 55 705 sliggoo 175 8 dragon none 68 75 53 83 113 60 6 452 22
56 56 706 goodra 1505 20 dragon none 90 100 70 110 150 80 6 600 22
68 68 718 zygarde-50 3050 50 dragon ground 108 100 121 81 95 95 6 600 22
60 60 782 jangmo-o 297 6 dragon none 45 55 65 45 45 45 7 300 22
85 85 895 regidrago 2000 21 dragon none 200 100 50 100 50 80 8 580 22

Instead of digging into every cluster, we can examine them all at once. A number of clusters are present across all generations, a good sign that some common archetypes do in fact recur in every generation.

In [220]:
sns.histplot(all_gens, x = "gen", y = "cluster", bins = 50)
Out[220]:
<AxesSubplot: xlabel='gen', ylabel='cluster'>
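
The heatmap gives a visual impression; the same question can be answered by counting. This is a minimal sketch on a toy frame (the values are hypothetical, and only the `gen` and `cluster` column names come from the notebook) that lists the clusters appearing in every generation.

```python
import pandas as pd

# Toy stand-in for all_gens with only the two columns needed (hypothetical values).
toy = pd.DataFrame({
    "gen":     [1, 2, 3, 1, 1, 2, 3, 3],
    "cluster": [0, 0, 0, 1, 1, 1, 2, 2],
})

# Rows are clusters, columns are generations, cells are member counts.
counts = pd.crosstab(toy["cluster"], toy["gen"])

# A cluster spans every generation if it has members in as many distinct
# generations as exist in the data.
n_gens = toy["gen"].nunique()
gens_per_cluster = toy.groupby("cluster")["gen"].nunique()
in_every_gen = gens_per_cluster[gens_per_cluster == n_gens].index.tolist()

print(counts)
print(in_every_gen)  # [0]
```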

Another way to take a look at all clusters is to roll up some summary statistics.

In [235]:
from collections import Counter
import random

clust_info = []
for i in range(50):
    cluster = all_gens.loc[all_gens['cluster'] == i]
    c, n = centroid_names[i]
    clust_size = len(cluster['primary_type'])

    # most common primary type and the share of the cluster that has it (ties broken at random)
    ptype_counts = Counter(cluster['primary_type'])
    ptype_greatest = max(ptype_counts.values())
    mc_ptype = random.choice([item for item, count in ptype_counts.items() if count == ptype_greatest])
    ptype_sig = ptype_greatest / clust_size

    # same for secondary type
    stype_counts = Counter(cluster['secondary_type'])
    stype_greatest = max(stype_counts.values())
    mc_stype = random.choice([item for item, count in stype_counts.items() if count == stype_greatest])
    stype_sig = stype_greatest / clust_size

    # mean and (population) standard deviation of stat_total
    stat_total_mean = np.mean(cluster['stat_total'])
    stat_total_sd = np.std(cluster['stat_total'])
    clust_info.append((i, n, clust_size, mc_ptype, ptype_sig, mc_stype, stype_sig, stat_total_mean, stat_total_sd))

In the table below, we can see each cluster, its centroid, its size, its most common Primary and Secondary Type along with the proportion of the cluster that shares each, and the mean and standard deviation of stat_total. This table confirms what we saw above: these clusters are heavily weighted by the typing of the Pokemon within. Nearly every cluster has a Primary or Secondary Type shared by all of its members (a proportion of 1.0), and we see some wide spreads in stat_total within each.

In [238]:
clust_df = pd.DataFrame(clust_info, columns = ['cluster','centroid','size','most_common_ptype','ptype_prop','most_common_stype','stype_prop','stat_total_mean','stat_total_sd'])
clust_df
Out[238]:
cluster centroid size most_common_ptype ptype_prop most_common_stype stype_prop stat_total_mean stat_total_sd
0 0 oranguru 22 water 0.272727 psychic 1.000000 476.545455 68.077982
1 1 snivy 24 grass 1.000000 none 0.833333 288.000000 42.265037
2 2 quaxwell 66 water 1.000000 none 1.000000 393.136364 85.398511
3 3 ivysaur 33 grass 0.424242 poison 1.000000 386.515152 100.942264
4 4 pidgeotto 44 normal 0.613636 flying 1.000000 387.772727 83.611891
5 5 typhlosion 19 fire 1.000000 none 0.684211 528.157895 49.035179
6 6 glalie 24 ice 1.000000 none 0.666667 441.166667 105.763363
7 7 misdreavus 15 ghost 1.000000 none 0.666667 496.333333 38.141258
8 8 nidorino 23 poison 1.000000 none 0.695652 395.391304 89.074341
9 9 cherrim 29 grass 1.000000 none 0.896552 470.689655 51.684122
10 10 arceus 12 normal 0.416667 none 0.750000 625.000000 67.144124
11 11 bombirdier 1 flying 1.000000 dark 1.000000 485.000000 0.000000
12 12 slurpuff 28 fairy 1.000000 none 0.750000 430.928571 123.141303
13 13 scovillain 15 ghost 0.200000 fire 1.000000 464.066667 111.968130
14 14 carracosta 16 water 0.312500 rock 1.000000 445.625000 69.930211
15 15 sawsbuck 24 bug 0.250000 grass 1.000000 427.208333 103.020216
16 16 klawf 21 rock 1.000000 none 0.809524 412.857143 98.829476
17 17 sneasel 5 dark 0.600000 ice 1.000000 450.600000 84.884863
18 18 dolliv 13 grass 0.230769 normal 1.000000 430.076923 98.214413
19 19 plusle 40 electric 1.000000 none 0.875000 422.775000 113.673983
20 20 charjabug 10 bug 0.400000 electric 1.000000 477.200000 105.776935
21 21 chimecho 44 psychic 1.000000 none 0.886364 423.772727 116.524649
22 22 druddigon 21 dragon 1.000000 none 0.619048 485.571429 123.819597
23 23 metang 6 steel 1.000000 psychic 1.000000 453.333333 124.721913
24 24 skitty 30 normal 0.900000 none 1.000000 272.666667 48.617098
25 25 dewpider 9 rock 0.222222 bug 1.000000 377.000000 109.425774
26 26 mudbray 24 ground 1.000000 none 0.666667 401.958333 93.614671
27 27 wailord 4 water 0.500000 none 0.500000 572.500000 72.240916
28 28 machoke 32 fighting 1.000000 none 0.875000 414.031250 99.155334
29 29 bewear 21 bug 0.190476 fighting 1.000000 534.571429 48.190664
30 30 vigoroth 44 normal 0.954545 none 1.000000 463.136364 44.518429
31 31 fini 17 psychic 0.235294 fairy 1.000000 509.882353 61.187160
32 32 arctovish 12 water 0.333333 ice 1.000000 499.500000 84.500000
33 33 klang 19 steel 1.000000 none 0.526316 445.684211 94.256120
34 34 rellor 23 bug 1.000000 none 0.913043 264.043478 78.533559
35 35 brambleghast 20 grass 0.150000 ghost 1.000000 474.300000 119.663319
36 36 drampa 22 dark 0.181818 dragon 1.000000 533.272727 99.370831
37 37 mareanie 6 poison 0.500000 water 1.000000 338.166667 89.135509
38 38 scizor 25 bug 0.200000 steel 1.000000 496.040000 88.678963
39 39 marshtomp 18 water 0.500000 ground 1.000000 418.500000 105.054087
40 40 mightyena 20 dark 1.000000 none 0.650000 425.000000 107.864730
41 41 azurill 14 normal 0.285714 fairy 1.000000 296.785714 85.995995
42 42 sealeo 4 ice 1.000000 water 1.000000 450.000000 109.544512
43 43 graveler 6 rock 1.000000 ground 1.000000 380.000000 67.144124
44 44 corvisquire 8 flying 1.000000 none 0.375000 430.000000 121.114615
45 45 monferno 9 fire 0.666667 fighting 1.000000 428.666667 76.755022
46 46 sinistea 11 ghost 1.000000 none 0.545455 333.818182 60.809416
47 47 growlithe 24 fire 1.000000 none 0.958333 341.833333 51.906700
48 48 kabutops 9 rock 0.666667 water 1.000000 456.777778 90.950265
49 49 shiftry 22 grass 0.227273 dark 1.000000 467.227273 88.926390
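
A similar roll-up can be sketched with a pandas `groupby` instead of an explicit loop. The toy frame below is a hypothetical stand-in for `all_gens`, and `top_share` is an illustrative helper, not a function from the notebook. It reports only the proportions, not the name of the most common type.

```python
import pandas as pd

# Toy stand-in for all_gens (hypothetical rows).
toy = pd.DataFrame({
    "cluster":        [0, 0, 0, 1, 1],
    "primary_type":   ["fire", "fire", "dark", "grass", "grass"],
    "secondary_type": ["fighting", "fighting", "fighting", "none", "poison"],
    "stat_total":     [405, 530, 348, 300, 400],
})

def top_share(s):
    """Share of the group held by its most common value."""
    return s.value_counts(normalize=True).iloc[0]

summary = toy.groupby("cluster").agg(
    size=("stat_total", "size"),
    ptype_prop=("primary_type", top_share),
    stype_prop=("secondary_type", top_share),
    stat_total_mean=("stat_total", "mean"),
    # ddof=0 gives the population SD, matching np.std's default
    stat_total_sd=("stat_total", lambda s: s.std(ddof=0)),
)
print(summary)
```

Unlike the `random.choice` tie-break in the loop, this version is deterministic, since a tie for the most common value does not change the proportion.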

Since we know that typing played a heavy role in forming these clusters, what happens if the same analysis is run without type? The intent of this run is to find competitive archetypes defined by stats alone.

In [249]:
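# Drop identifiers, size, generation, and typing, leaving the numeric stat columns.
# Note: 'cluster' (added to all_gens above) is not in this drop list, so it stays
# in X_js as a feature; add it to the list to cluster on stats alone.
X_js = all_gens.drop(['name', 'height', 'weight', 'id', 'Unnamed: 0', 'gen', 'primary_type', 'secondary_type'], axis=1)
X_js_trans = scaler.fit_transform(X_js)

# Record inertia for each K, as in the earlier run
js_scores = []
js_clust = []
for i in range(1, max_clust):
    kmeans = KMeans(n_clusters=i, random_state=42, n_init="auto").fit(X_js_trans)
    score = kmeans.inertia_
    js_scores.append(score)
    js_clust.append(i)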
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
C:\Users\alexl_g8yj9pc\anaconda3\envs\pokemon\lib\site-packages\sklearn\cluster\_kmeans.py:1382: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=4.
  warnings.warn(
In [251]:
sns.scatterplot(x = js_clust, y = js_scores)
Out[251]:
<AxesSubplot: >
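The cell that builds js_clust and js_scores is not shown in this part of the notebook. As a rough sketch of how such a curve could be produced, assuming js_clust holds candidate cluster counts and js_scores holds one silhouette score per count (the real notebook may use a different metric), here is the idea on synthetic data standing in for X_js_trans:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for X_js_trans (hypothetical data with six features).
X_demo, _ = make_blobs(n_samples=300, centers=4, n_features=6, random_state=42)

k_values = list(range(2, 9))   # candidate numbers of clusters
scores = []
for k in k_values:
    labels = KMeans(n_clusters=k, random_state=42, n_init=10).fit_predict(X_demo)
    scores.append(silhouette_score(X_demo, labels))

# The k with the highest silhouette score is one reasonable candidate.
best_k = k_values[int(np.argmax(scores))]
```

Passing k_values and scores to sns.scatterplot gives the same kind of plot as the cell above.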
In [252]:
kmeans_js_fin = KMeans(n_clusters=16, random_state=42, n_init="auto").fit(X_js_trans)
js_labels = kmeans_js_fin.predict(X_js_trans)
all_gens['js_cluster'] = js_labels
In [253]:
for_pair = ['hp','attack','defense','spa','spd','speed','stat_total','js_cluster']
sns.pairplot(all_gens[for_pair], hue = 'js_cluster', corner = True, plot_kws={'alpha':0.1})
Out[253]:
<seaborn.axisgrid.PairGrid at 0x1ad2acc8b50>
In [254]:
sns.histplot(all_gens, x = "gen", y = "js_cluster", bins = 50)
Out[254]:
<AxesSubplot: xlabel='gen', ylabel='js_cluster'>
In [257]:
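import random
import re
from collections import Counter

js_centers = kmeans_js_fin.cluster_centers_

# For each cluster center, find the row of X_js_trans closest to it (Euclidean distance).
js_centroids = []
for i, c in enumerate(js_centers):
    closest = np.inf
    closest_j = None
    for j, v in enumerate(X_js_trans):
        dist = np.linalg.norm(v - c)
        if dist < closest:
            closest = dist
            closest_j = j
    js_centroids.append((i, closest_j))

# Look up the name of the Pokemon closest to each center.
js_centroid_names = []
for clust, idx in js_centroids:
    mon = all_gens.loc[all_gens.id == idx+1] #matching by dexID, not index, so add one
    # The pattern matches the first run of lowercase letters followed by whitespace,
    # so a hyphenated name such as 'mr-mime' comes out as 'mime'.
    n = re.search(r"(\d?)([a-z]+)\s", str(mon['name']))
    name = n.group()[:-1]
    js_centroid_names.append((clust, name))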

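js_clust_info = []
# range(15) summarizes clusters 0-14; KMeans above was fit with n_clusters=16, so cluster 15 is left out.
for i in range(15):
    cluster = all_gens.loc[all_gens['js_cluster'] == i]
    c,n = js_centroid_names[i]
    clust_size = len(cluster['primary_type'])
    # most common primary and secondary type (ties broken at random) and their share of the cluster
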
    ptype_counts = Counter(cluster['primary_type'])
    ptype_greatest = max(ptype_counts.values())
    mc_ptype = random.choice([item for item, count in ptype_counts.items() if count == ptype_greatest])
    ptype_sig = ptype_greatest / clust_size
    
    stype_counts = Counter(cluster['secondary_type'])
    stype_greatest = max(stype_counts.values())
    mc_stype = random.choice([item for item, count in stype_counts.items() if count == stype_greatest])
    stype_sig = stype_greatest / clust_size
                                     
    stat_total_mean = np.mean(cluster['stat_total'])
    stat_total_sd = np.std(cluster['stat_total'])
    
    hp_mean = np.mean(cluster['hp'])
    hp_sd = np.std(cluster['hp'])
    
    attack_mean = np.mean(cluster['attack'])
    attack_sd = np.std(cluster['attack'])
    
    defense_mean = np.mean(cluster['defense'])
    defense_sd = np.std(cluster['defense'])
    
    spa_mean = np.mean(cluster['spa'])
    spa_sd = np.std(cluster['spa'])
    
    spd_mean = np.mean(cluster['spd'])
    spd_sd = np.std(cluster['spd'])
    
    speed_mean = np.mean(cluster['speed'])
    speed_sd = np.std(cluster['speed'])
    
    js_clust_info.append((i,n,clust_size,mc_ptype,ptype_sig,mc_stype,stype_sig,stat_total_mean,stat_total_sd,
                      hp_mean, hp_sd, attack_mean, attack_sd, defense_mean, defense_sd, spa_mean, spa_sd,
                      spd_mean, spd_sd, speed_mean, speed_sd))
    
js_clust_df = pd.DataFrame(js_clust_info, columns = ['cluster','centroid','clust_size','most_common_ptype','ptype_prop','most_common_stype',
                                                     'stype_prop','stat_total_mean','stat_total_sd','hp_mean','hp_sd','attack_mean', 'attack_sd',
                                                     'defense_mean','defense_sd','spa_mean','spa_sd','spd_mean','spd_sd','speed_mean','speed_sd'])
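The nested loop above that finds the closest Pokemon to each cluster center can also be written with a single scikit-learn call. This is only a sketch on made-up data: X_demo and names_demo are hypothetical stand-ins for X_js_trans and the name column, and it assumes the rows of the feature matrix are in the same order as the rows of the dataframe, which lets a positional .iloc lookup replace the dex id arithmetic and the regex.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances_argmin_min

# Hypothetical stand-ins for X_js_trans and all_gens['name'].
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(60, 6))
names_demo = pd.Series([f"mon{i}" for i in range(60)])

km = KMeans(n_clusters=3, random_state=42, n_init=10).fit(X_demo)

# For each center, the row index of the nearest observation and its distance.
nearest_rows, nearest_dists = pairwise_distances_argmin_min(km.cluster_centers_, X_demo)

# Positional lookup, assuming X_demo and names_demo share row order.
representatives = names_demo.iloc[nearest_rows].tolist()
```

Because the lookup is positional, it does not matter that the index of all_gens repeats across generations after pd.concat.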
In [258]:
js_clust_df
Out[258]:
cluster centroid clust_size most_common_ptype ptype_prop most_common_stype stype_prop stat_total_mean stat_total_sd hp_mean ... attack_mean attack_sd defense_mean defense_sd spa_mean spa_sd spd_mean spd_sd speed_mean speed_sd
0 0 bellossom 89 grass 0.191011 none 0.550562 502.865169 37.903245 84.808989 ... 81.415730 18.059707 85.235955 16.198920 99.258427 18.059532 93.157303 17.072811 58.988764 18.909895
1 1 fletchling 90 water 0.211111 none 0.577778 288.966667 36.696488 43.866667 ... 50.922222 13.133695 40.633333 10.234419 46.844444 14.982146 42.022222 10.448793 64.677778 16.627172
2 2 hariyama 17 normal 0.294118 none 0.647059 484.294118 60.184779 170.058824 ... 78.529412 35.633952 51.823529 23.047867 63.294118 22.905855 69.647059 27.118349 50.941176 19.912959
3 3 watchog 44 normal 0.386364 none 0.545455 454.704545 26.468317 65.659091 ... 82.386364 18.976320 71.681818 14.913040 69.977273 18.227712 72.681818 14.183493 92.318182 18.650826
4 4 swampert 86 water 0.197674 none 0.197674 514.174419 33.076090 96.523256 ... 108.302326 21.125907 88.558140 16.629062 76.488372 19.356355 80.813953 12.720165 63.488372 18.416211
5 5 clauncher 100 grass 0.260000 none 0.550000 351.340000 43.591563 56.710000 ... 54.670000 15.297094 63.150000 18.768258 63.620000 17.495588 67.650000 17.470189 45.540000 14.854912
6 6 drednaw 61 normal 0.163934 none 0.540984 492.721311 31.362048 77.360656 ... 113.098361 18.717665 88.098361 18.572591 54.688525 12.731194 75.983607 16.689539 83.491803 17.275609
7 7 incarnate 80 dragon 0.150000 none 0.300000 607.137500 51.145563 91.900000 ... 111.187500 22.008574 90.325000 20.210625 116.150000 22.174366 91.550000 22.720530 106.025000 18.503361
8 8 chimchar 60 fire 0.266667 none 0.450000 304.166667 25.390396 46.400000 ... 54.466667 15.844522 49.050000 16.089774 54.516667 16.070150 49.416667 13.426704 50.316667 16.533291
9 9 swanna 79 water 0.240506 none 0.518987 456.582278 39.021775 69.012658 ... 83.012658 18.311298 62.632911 11.101576 78.772152 18.091854 64.860759 11.654015 98.291139 22.836320
10 10 crustle 41 rock 0.268293 none 0.439024 493.902439 55.973776 79.170732 ... 108.073171 21.725999 136.902439 26.970449 58.170732 17.002012 65.073171 18.584534 46.512195 17.348820
11 11 silicobra 78 ground 0.141026 none 0.653846 329.012821 33.773565 55.474359 ... 72.743590 16.016367 69.051282 19.723666 41.153846 11.285431 48.833333 14.556184 41.756410 15.947177
12 12 palpitoad 43 fire 0.279070 none 0.372093 406.744186 30.980183 71.302326 ... 78.744186 18.215880 58.418605 11.548145 66.860465 15.599018 58.209302 9.064518 73.209302 16.391373
13 13 mismagius 53 psychic 0.169811 none 0.433962 515.698113 38.559234 72.018868 ... 64.566038 12.978225 68.886792 15.555937 109.716981 20.274882 100.886792 19.914592 99.622642 17.047355
14 14 lechonk 63 bug 0.285714 none 0.730159 236.507937 32.273569 46.222222 ... 35.476190 14.386785 41.015873 13.429997 34.428571 12.921883 41.269841 13.902388 38.095238 16.594811

15 rows × 21 columns
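A table like the one above can also be built with a pandas groupby instead of one variable per statistic. The sketch below uses a tiny made-up dataframe (demo and its values are hypothetical) with the same column names as all_gens. Two choices to note: np.std as used in the notebook computes the population standard deviation (ddof=0) while pandas defaults to ddof=1, so ddof=0 is passed explicitly, and ties for the most common type are resolved alphabetically through mode() rather than at random.

```python
import pandas as pd

# Hypothetical miniature of all_gens, just enough columns to show the pattern.
demo = pd.DataFrame({
    "js_cluster": [0, 0, 0, 1, 1],
    "primary_type": ["water", "water", "fire", "grass", "bug"],
    "hp": [50, 70, 60, 40, 80],
    "attack": [55, 65, 75, 45, 85],
})

stat_cols = ["hp", "attack"]
grouped = demo.groupby("js_cluster")

means = grouped[stat_cols].mean().add_suffix("_mean")
sds = grouped[stat_cols].std(ddof=0).add_suffix("_sd")   # population sd, like np.std
summary = pd.concat([grouped.size().rename("clust_size"), means, sds], axis=1)

# Most common primary type per cluster; mode() sorts ties, so the result is deterministic.
summary["most_common_ptype"] = grouped["primary_type"].agg(lambda s: s.mode().iloc[0])
```

Extending stat_cols to all six stats plus stat_total would cover every cluster that appears in js_cluster without hard-coding the number of clusters.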

Shoutout to https://typewind.github.io/2017/09/29/radar-chart/ for the radar chart code that the cell below adapts.

In [283]:
import matplotlib.pyplot as plt

# Stat order and angles are chosen so the six axes sit at the corners of a hexagon.
labels = ['attack_mean', 'hp_mean', 'spa_mean', 'spd_mean', 'speed_mean', 'defense_mean']
clean_labels = ['Attack', 'HP', 'Sp. Attack', 'Sp. Defense', 'Speed', 'Defense']
angles = np.array([np.pi/6, np.pi/2, 5*np.pi/6, 7*np.pi/6, 9*np.pi/6, 11*np.pi/6])

for i in range(len(js_clust_df)):
    stats = js_clust_df.loc[i, labels].values.astype(float)
    # Repeat the first point so the polygon closes.
    closed_stats = np.concatenate((stats, [stats[0]]))
    closed_angles = np.concatenate((angles, [angles[0]]))
    fig = plt.figure()
    ax = fig.add_subplot(111, polar=True)   # Set polar axis
    ax.plot(closed_angles, closed_stats, 'o-', linewidth=2)  # Draw the outline of the radar chart
    ax.fill(closed_angles, closed_stats, alpha=0.25)  # Fill the area
    ax.set_thetagrids(angles * 180/np.pi, clean_labels)  # Set the label for each axis
    # Title: the centroid Pokemon's name and the cluster's mean stat total
    centroid = js_clust_df.loc[i, "centroid"]
    tot_stat = js_clust_df.loc[i, "stat_total_mean"]
    ax.set_title(f"{centroid} (mean stat total: {tot_stat:.1f})")
    #ax.set_rlim(0,250)
    ax.grid(True)
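If the radar chart is needed again elsewhere, the drawing steps can be pulled into a small function. This is a sketch, not the notebook's code: radar_chart is a hypothetical helper, it spaces the axes evenly starting at angle 0 rather than using the hand-picked angles above, and the example values are the rounded means of the swampert cluster from the summary table.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def radar_chart(values, labels, title):
    """Draw one closed radar polygon (hypothetical helper, not from the notebook)."""
    values = np.asarray(values, dtype=float)
    angles = np.linspace(0, 2 * np.pi, len(values), endpoint=False)
    # Repeat the first point so the outline closes.
    closed_values = np.append(values, values[0])
    closed_angles = np.append(angles, angles[0])
    fig, ax = plt.subplots(subplot_kw={"polar": True})
    ax.plot(closed_angles, closed_values, "o-", linewidth=2)
    ax.fill(closed_angles, closed_values, alpha=0.25)
    ax.set_xticks(angles)        # one tick per stat
    ax.set_xticklabels(labels)
    ax.set_title(title)
    return fig, ax


stat_labels = ["HP", "Attack", "Defense", "Sp. Attack", "Sp. Defense", "Speed"]
fig, ax = radar_chart([96.5, 108.3, 88.6, 76.5, 80.8, 63.5], stat_labels, "swampert (514.2)")
```

Calling ax.set_rlim with the same limits on every chart would make the clusters directly comparable by eye.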
    
In [284]:
all_gens.loc[all_gens['js_cluster'] == 13]
Out[284]:
     Unnamed: 0    id           name  weight  height  primary_type  secondary_type   hp  attack  defense  spa  spd  speed  gen  stat_total  cluster  js_cluster
  5           5     6      charizard     905      17          fire          flying   78      84       78  109   85    100    1         534        5          13
 37          37    38      ninetales     199      11          fire            none   73      76       75   81  100    100    1         505        5          13
 63          63    64        kadabra     565      13       psychic            none   40      35       30  120   70    105    1         400       21          13
 64          64    65       alakazam     480      15       psychic            none   55      50       45  135   95    120    1         500       21          13
 72          72    73     tentacruel     550      16         water          poison   80      70       65   80  120    100    1         515        3          13
 93          93    94         gengar     405      15         ghost          poison   60      65       60  130   75    110    1         500        7          13
100         100   101      electrode     666      12      electric            none   60      50       70   80   80    150    1         490       19          13
120         120   121        starmie     800      11         water         psychic   60      75       85  100   85    115    1         520        0          13
121         121   122        mr-mime     545      13       psychic           fairy   40      45       65  100  120     90    1         460       31          13
123         123   124           jynx     406      14           ice         psychic   65      50       35  115   95     95    1         455        6          13
124         124   125     electabuzz     300      11      electric            none   65      83       57   95   85    105    1         490       19          13
134         134   135        jolteon     245       8      electric            none   65      65       60  110   95    130    1         525       19          13
  5           5   157     typhlosion     795      17          fire            none   78      84       78  109   85    100    2         534        5          13
 44          44   196         espeon     265       9       psychic            none   65      65       60  130   95    110    2         525       21          13
 77          77   229       houndoom     350      14          dark            fire   75      90       50  110   80     95    2         500       13          13
  2           2   254       sceptile     522      17         grass            none   70      85       65  105   85    120    3         530        9          13
 30          30   282      gardevoir     484      16       psychic           fairy   68      65       65  125  115     80    3         518       31          13
 32          32   284     masquerain      36       8           bug          flying   70      60       62  100   82     80    3         454        4          13
 74          74   326        grumpig     715       9       psychic            none   80      45       65   90  110     80    3         470       21          13
 98          98   350        milotic    1620      62         water            none   95      60       79  100  125     81    3         540       10          13
128         128   380         latias     400      14        dragon         psychic   80      80       90  110  130    110    3         600       22          13
 20          20   407       roserade     145       9         grass          poison   60      70       65  125  105     90    4         515        3          13
 42          42   429      mismagius      44       9         ghost            none   60      60       60  105  105    105    4         495        7          13
 80          80   467      magmortar     680      16          fire            none   75      95       67  125   95     83    4         540        5          13
 81          81   468       togekiss     380      15         fairy          flying   85      50       95  120  115     80    4         545       12          13
  3           3   497      serperior     630      33         grass            none   75      75       95   75   95    113    5         528        9          13
 55          55   549      lilligant     163      11         grass            none   70      60       75  110   75     90    5         480        9          13
 67          67   561       sigilyph     140      14       psychic          flying   72      58       80  103   80     97    5         490       21          13
115         115   609     chandelure     343      10         ghost            fire   60      55       90  145   90     80    5         520       13          13
121         121   615      cryogonal    1480      11           ice            none   80      50       50   95  135    105    5         515        6          13
143         143   637      volcarona     460      16           bug            fire   85      60       65  135  105    100    5         550       13          13
154         154   648  meloetta-aria      65       6        normal         psychic  100      77       77  128  128     90    5         600        0          13
  5           5   655        delphox     390      15          fire         psychic   75      69       72  114  100    104    6         534        0          13
 18          18   668         pyroar     815      15          fire          normal   86      68       72  109   66    106    6         507       18          13
 21          21   671        florges     100      11         fairy            none   78      65       68  112  154     75    6         552       12          13
 28          28   678  meowstic-male      85       6       psychic            none   74      48       76   83   81    104    6         466       21          13
 45          45   695      heliolisk     210      10      electric          normal   62      55       52  109   94    109    6         481       18          13
 42          42   764         comfey       3       1         fairy            none   51      52       90   82  110    100    7         485       12          13
 71          71   793       nihilego     555      12          rock          poison  109      53       47  127  131    103    7         570        3          13
  8           8   818       inteleon     452      19         water            none   70      85       65  125   65    120    8         530        2          13
 16          16   826       orbeetle     408       4           bug         psychic   60      45      110   80  120     90    8         505        0          13
 45          45   855    polteageist       4       2         ghost            none   60      65       65  134  114     70    8         508        7          13
 59          59   869       alcremie       5       3         fairy            none   65      60       75  110  121     64    8         495       12          13
 63          63   873       frosmoth     420      13           ice             bug   70      65       60  125   90     65    8         475       25          13
 66          66   876  indeedee-male     280       9       psychic          normal   60      65       55  105   95     95    8         475       18          13
 87          87   897      spectrier     445      20         ghost            none  100      65       60  145   80    130    8         580        7          13
 43          43   949     toedscruel     580      19        ground           grass   80      70       65   80  120    100    9         515       15          13
 53          53   959       tinkaton    1128       7         fairy           steel   85      75       77   70  105     94    9         506       12          13
 64          64   970       glimmora     450      15          rock          poison   83      55       90  130   81     86    9         525        3          13
 79          79   985    scream-tail      80      12         fairy         psychic  115      65       99   65  115    111    9         570       12          13
 81          81   987   flutter-mane      40      14         ghost           fairy   55      55       55  135  135    135    9         570        7          13
 88          88   994      iron-moth     360      12          fire          poison   80      70       60  140  110    110    9         570        5          13
 98          98  1004         chi-yu      49       4          dark            fire   55      80       80  135  120    100    9         570       13          13
In [287]:
all_gens.loc[all_gens['js_cluster'] == 7]
Out[287]:
     Unnamed: 0    id           name  weight  height  primary_type  secondary_type   hp  attack  defense  spa  spd  speed  gen  stat_total  cluster  js_cluster
144         144   145         zapdos     526      16      electric          flying   90      90       85  125   90    100    1         580       19           7
148         148   149      dragonite    2100      22        dragon          flying   91     134       95  100  100     80    1         600       22           7
149         149   150         mewtwo    1220      20       psychic            none  106     110       90  154   90    130    1         680       21           7
150         150   151            mew      40       4       psychic            none  100     100      100  100  100    100    1         600       21           7
 91          91   243         raikou    1780      19      electric            none   90      85       75  115  100    115    2         580       19           7
...         ...   ...            ...     ...     ...           ...             ...  ...     ...      ...  ...  ...    ...  ...         ...      ...         ...
 94          94  1000      gholdengo     300      12         steel           ghost   87      60       95  133   91     84    9         550       33           7
 96          96  1002      chien-pao    1522      19          dark             ice   80     120       80   90   65    135    9         570       17           7
100         100  1006   iron-valiant     350      14         fairy        fighting   74     130       90  120   60    116    9         590       29           7
101         101  1007       koraidon    3030      25      fighting          dragon  100     135      115   85  100    135    9         670       36           7
102         102  1008       miraidon    2400      35      electric          dragon  100      85      100  135  115    135    9         670       36           7

80 rows × 17 columns
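
Scanning rows by eye only goes so far. One way to characterize a cluster is to compare its mean stats with the mean over all Pokemon. The following is a minimal sketch, assuming the column names shown in the outputs above. `cluster_profile` is a hypothetical helper, and the four-row `sample` frame is copied from rows in the two tables purely so the example runs on its own; the numbers it produces are illustrative, not the real cluster profiles.

```python
import pandas as pd

STAT_COLS = ['hp', 'attack', 'defense', 'spa', 'spd', 'speed']


def cluster_profile(df, cluster_col='js_cluster', stat_cols=STAT_COLS):
    """Per-cluster mean of each stat minus the mean of that stat over all rows.

    Positive values mark stats where a cluster sits above the overall average.
    """
    return df.groupby(cluster_col)[stat_cols].mean() - df[stat_cols].mean()


# Tiny illustrative sample (rows copied from the outputs above);
# the full analysis would pass `all_gens` instead.
sample = pd.DataFrame(
    [
        ('alakazam', 55, 50, 45, 135, 95, 120, 13),
        ('gengar', 60, 65, 60, 130, 75, 110, 13),
        ('dragonite', 91, 134, 95, 100, 100, 80, 7),
        ('mew', 100, 100, 100, 100, 100, 100, 7),
    ],
    columns=['name'] + STAT_COLS + ['js_cluster'],
)

profile = cluster_profile(sample)
print(profile.round(2))
```

Calling `cluster_profile(all_gens)` would give one row per `js_cluster` value, which can be easier to compare across clusters than the raw member lists.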
